Implementing Urdu Grammar as Open Source Software
Abstract
Urdu is a challenging language for three reasons: first, its Perso-Arabic script; second, its morphological system, which has inherited grammatical forms and vocabulary from Arabic, Persian, and the native languages of South Asia; and third, its pragmatically neutral constituent order, SOV (subject-object-verb). Today, the state-of-the-art technology for writing grammars (morphology + syntax) is to use special-purpose languages based on finite-state technology. These languages are mostly based on regular expressions. In our opinion, they are still close to machine code. We therefore emphasize using a higher-level language to capture linguistic abstractions; that higher-level code can then be translated into finite-state code by a tool, if required.
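The abstract does not name the higher-level language the authors use. The following is therefore only a minimal sketch of the general idea, written in Python under several assumptions:

- Urdu is written in a hypothetical plain-ASCII romanization.
- The paradigm functions `masc_a` and `fem_i` are illustrative, as are the tag names.
- `compile_lexicon` and `analyse` are likewise illustrative.

Each noun paradigm is written once as a function. A small "compiler" then turns every generated form into transitions of a finite-state acceptor, here an unminimized letter trie.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical ASCII romanization of Urdu: no retroflex or nasal marks.
Paradigm = Callable[[str], Dict[str, str]]

def masc_a(lemma: str) -> Dict[str, str]:
    """Illustrative paradigm: masculine nouns in -a, e.g. larka 'boy'."""
    root = lemma[:-1]
    return {"sg.dir": lemma, "sg.obl": root + "e", "sg.voc": root + "e",
            "pl.dir": root + "e", "pl.obl": root + "on", "pl.voc": root + "o"}

def fem_i(lemma: str) -> Dict[str, str]:
    """Illustrative paradigm: feminine nouns in -i, e.g. larki 'girl'."""
    return {"sg.dir": lemma, "sg.obl": lemma, "sg.voc": lemma,
            "pl.dir": lemma + "yan", "pl.obl": lemma + "yon",
            "pl.voc": lemma + "yo"}

class State:
    """One state of a deterministic acyclic automaton (a letter trie)."""
    def __init__(self) -> None:
        self.edges: Dict[str, "State"] = {}
        self.analyses: List[str] = []  # non-empty only on accepting states

def compile_lexicon(entries: List[Tuple[str, Paradigm]]) -> State:
    """Expand each paradigm and add every form as a path of transitions."""
    start = State()
    for lemma, paradigm in entries:
        for tag, form in paradigm(lemma).items():
            state = start
            for ch in form:
                state = state.edges.setdefault(ch, State())
            state.analyses.append(f"{lemma}+{tag}")
    return start

def analyse(start: State, word: str) -> List[str]:
    """Run the automaton; return all analyses of the word, or [] if rejected."""
    state = start
    for ch in word:
        nxt = state.edges.get(ch)
        if nxt is None:
            return []
        state = nxt
    return sorted(state.analyses)

LEXICON = compile_lexicon([("larka", masc_a), ("larki", fem_i)])
print(analyse(LEXICON, "larke"))  # ['larka+pl.dir', 'larka+sg.obl', 'larka+sg.voc']
```

The linguist states the -a paradigm once in `masc_a` and never writes a regular expression per form. A real toolchain would also minimize the automaton and work directly on the Perso-Arabic script.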
Similar resources
A Computational Classification of Urdu Dynamic Copula Verb
In this paper, a lexical functional grammar for the automatic classification of the Urdu copula verb hO (be/become) is presented, following linguistic theory. A test suite of sentences containing almost all conjugated forms of the copula verb is extracted from a raw corpus. Only copular constructions are kept, because the copula verb hO is very dynamic in nature...
An Open Source Urdu Resource Grammar
We develop a grammar for Urdu in Grammatical Framework (GF). GF is a programming language for defining multilingual grammar applications. The GF resource grammar library currently supports 16 languages. These grammars follow an interlingua approach and consist of morphology and syntax modules that cover a wide range of features of a language. In this paper we explore different syntactic features of...
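In GF, the interlingua is an abstract syntax tree that each language's concrete syntax linearizes in its own way. The Python below only mimics that idea and is not GF code. The names `Pred`, `lin_eng`, and `lin_urd` and the toy lexicons are hypothetical, and they ignore agreement, case, and tense. It shows one language-neutral tree producing English SVO order and Urdu SOV order.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Pred:
    """Language-neutral abstract tree built from abstract lexical constants."""
    subj: str
    obj: str
    verb: str

# Hypothetical toy concrete lexicons (romanized Urdu); no agreement or case.
ENG: Dict[str, str] = {"Boy": "the boy", "Book": "a book", "Read": "reads"}
URD: Dict[str, str] = {"Boy": "larka", "Book": "kitab", "Read": "parhta hai"}

def lin_eng(t: Pred) -> str:
    # English concrete syntax: subject-verb-object
    return " ".join([ENG[t.subj], ENG[t.verb], ENG[t.obj]])

def lin_urd(t: Pred) -> str:
    # Urdu concrete syntax: subject-object-verb
    return " ".join([URD[t.subj], URD[t.obj], URD[t.verb]])

tree = Pred("Boy", "Book", "Read")
print(lin_eng(tree))  # the boy reads a book
print(lin_urd(tree))  # larka kitab parhta hai
```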
Urdu Summary Corpus
Language resources, such as corpora, are important for various natural language processing tasks. Urdu has millions of speakers around the world, but it is under-resourced in terms of standard evaluation resources. This paper reports the construction of a benchmark corpus of Urdu summaries (abstracts) to facilitate the development and evaluation of single-document summarization systems for Urdu...
Using An Open-Source Unification-Based System For CL/NLP Teaching
We demonstrate the open-source LKB system, which has been used to teach the fundamentals of constraint-based grammar development to several groups of students.
1 Overview of the LKB system
The LKB system is a grammar development environment that is distributed as part of the open-source LinGO tools (http://www-csli.stanford.edu/~aac/lkb.html and http://lingo.stanford.edu; see also Copestake and F...
Development of an Open Source Urdu Screen Reader for Visually Impaired People
Speech technology has made computers accessible to users with visual impairments, but the language barrier remains a great challenge. This project is an effort to overcome the language barrier faced by visually impaired people by giving them access to digital information through software that communicates with them in Urdu. A survey was conducted in schools for b...